A study of Thompson Sampling with Parameter h
Author
Abstract
The Thompson Sampling algorithm is a well-known Bayesian algorithm for solving the stochastic multi-armed bandit problem. At each time step, the algorithm chooses each arm with probability proportional to the probability of that arm being the current best arm. We modify the strategy by introducing a parameter h, which alters the importance of the probability of an arm being the current best arm. We show that the optimality of Thompson Sampling is robust to this perturbation within a range of parameter values for two-armed bandits.
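The abstract does not say how h enters the selection rule, so the following is only a minimal sketch under one assumed reading: arm i is chosen with probability proportional to p_i^h, where p_i is the posterior probability that arm i is currently best. The Bernoulli rewards, the uniform Beta(1, 1) priors, the Monte Carlo estimate of p_i, and all function names (`best_arm_probs`, `choose_arm`, `run_bandit`) are illustrative assumptions, not taken from the paper.

```python
import random


def best_arm_probs(successes, failures, n_draws=500, rng=random):
    """Monte Carlo estimate of P(arm i is best) under Beta(1 + s, 1 + f) posteriors."""
    k = len(successes)
    wins = [0] * k
    for _ in range(n_draws):
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        wins[draws.index(max(draws))] += 1
    return [w / n_draws for w in wins]


def choose_arm(probs, h, rng=random):
    """Pick an arm with probability proportional to probs[i] ** h.

    ASSUMPTION: this power form is one plausible reading of "parameter h";
    the abstract does not give the exact rule.
    """
    if h <= 0:
        raise ValueError("this sketch only handles h > 0")
    weights = [p ** h for p in probs]
    return rng.choices(range(len(probs)), weights=weights)[0]


def run_bandit(true_means, h, horizon=200, seed=0):
    """Simulate a Bernoulli bandit and return the cumulative expected regret."""
    rng = random.Random(seed)
    k = len(true_means)
    successes, failures = [0] * k, [0] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        probs = best_arm_probs(successes, failures, rng=rng)
        arm = choose_arm(probs, h, rng=rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret


if __name__ == "__main__":
    # Compare a few values of h on a hypothetical two-armed instance.
    for h in (0.5, 1.0, 2.0):
        print(h, run_bandit([0.6, 0.4], h=h, horizon=200, seed=1))
```

Under this assumed form, h = 1 matches ordinary Thompson Sampling up to Monte Carlo error, h > 1 sharpens the choice toward the arm currently believed best, and h < 1 flattens it toward uniform exploration.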
Similar resources
Parameter Identifiability Issues in a Latent Markov Model for Misclassified Binary Responses
Medical researchers may be interested in disease processes that are not directly observable. Imperfect diagnostic tests may be used repeatedly to monitor the condition of a patient in the absence of a gold standard. We consider parameter identifiability and estimability in a Markov model for alternating binary longitudinal responses that may be misclassified. Exactly ...
Horvitz-Thompson estimator of population mean under inverse sampling designs
An inverse sampling design is generally considered an appropriate technique when the population is divided into two subpopulations, one of which contains only a few units. In this paper, we derive the Horvitz-Thompson estimator for the population mean under inverse sampling designs, where subpopulation sizes are known. We then introduce an alternative unbiased estimator, corresponding to post-st...
Prognostic Factors Affecting the Results of Modified Thompson Quadricepsplasty for the Treatment of Extension Contracture of the Knee
Background: Knee extension contracture is a disabling complication after fractures around the knee. In this study we aimed to study factors influencing the outcomes of quadricepsplasty for the treatment of traumatic knee extension contracture. We hypothesized that there is no factor influencing the final range of knee motion. Methods: In this retrospective study, we included 64 patients who unde...
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits
We study a variant of the multi-armed bandit problem with multiple plays in which the user wishes to sample the m out of k arms with the highest expected rewards, but at any given time can only sample ℓ ≤ m arms. When ℓ = m, Thompson sampling was recently shown to be asymptotically efficient. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our new setting whe...
Experimental and Theoretical Study of Thompson Seedless Grapes Drying using Solar Evacuated Tube Collector with Force Convection Method
An evacuated tube solar collector drier is designed and developed to study, analytically and experimentally, the drying kinetics of Thompson seedless grapes in Pune, India. Drying experiments are carried out in the months of April-June for three consecutive years, 2013-2015. During the experimentation, temperatures of hot and cold air at various places, ambient relative humidity and humidity varia...
Journal title:
- CoRR
Volume abs/1710.02174, Issue
Pages -
Publication date 2017